Conversation

@adithya-s-k
Contributor

Summary

This PR adds support for BiGemma3 and ColGemma3 models based on the Gemma3-4B-IT backbone, enabling multilingual multimodal document retrieval.

Key Features

  • BiGemma3: Single-vector dense retrieval model with Matryoshka representation learning
    • Supports flexible embedding dimensions (768, 1536, 2560) at inference time (see the sketch after this list)
    • Efficient document retrieval with configurable accuracy/speed trade-offs
  • ColGemma3: Multi-vector late interaction model using ColBERT-style architecture
    • Fine-grained token-level matching with MaxSim scoring
    • 128-dimensional per-token embeddings
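
As a rough illustration of how the Matryoshka dimensions relate (a conceptual sketch only, not the PR's actual implementation): a smaller embedding is obtained by keeping the leading components of the full 2560-dimensional vector and re-normalizing.

import torch

def truncate_matryoshka(full_emb: torch.Tensor, dim: int) -> torch.Tensor:
    """full_emb: (batch, 2560) full-size embeddings; dim is one of 768, 1536, 2560."""
    # Keep the first `dim` components, then L2-normalize so cosine scores stay comparable.
    return torch.nn.functional.normalize(full_emb[..., :dim], dim=-1)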

Changes

  • Added BiGemma3 model in colpali_engine/models/gemma3/bigemma3/

    • modeling_bigemma.py: Model implementation with Matryoshka support
    • processing_bigemma.py: Processor for images and text
  • Added ColGemma3 model in colpali_engine/models/gemma3/colgemma3/

    • modeling_colgemma.py: Multi-vector model implementation
    • processing_colgemma.py: Processor with MaxSim scoring
  • Added comprehensive tests in tests/models/gemma3/

    • Unit tests for model loading
    • Integration tests for forward pass
    • Retrieval tests with visual documents

API Design

BiGemma3 allows choosing the embedding dimension at inference time:

import torch

from colpali_engine.models import BiGemma3, BiGemmaProcessor3

model = BiGemma3.from_pretrained(
    "Cognitive-Lab/NetraEmbed",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# `inputs` comes from the processor (batched images or queries);
# choose the embedding dimension at inference time
embeddings = model(**inputs, embedding_dim=1536)
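
A minimal end-to-end scoring sketch building on the snippet above. It assumes BiGemmaProcessor3 follows the usual colpali_engine processor interface (process_images / process_queries) and uses a placeholder image and query; the exact interface introduced in this PR may differ. Single-vector retrieval then reduces to cosine similarity between the truncated embeddings:

from PIL import Image

processor = BiGemmaProcessor3.from_pretrained("Cognitive-Lab/NetraEmbed")

images = [Image.new("RGB", (448, 448), "white")]   # placeholder document page
queries = ["What is the quarterly revenue?"]       # placeholder query

with torch.no_grad():
    doc_inputs = processor.process_images(images).to(model.device)
    query_inputs = processor.process_queries(queries).to(model.device)
    doc_emb = model(**doc_inputs, embedding_dim=1536)      # (num_docs, 1536)
    query_emb = model(**query_inputs, embedding_dim=1536)  # (num_queries, 1536)

# Cosine similarity between L2-normalized single-vector embeddings
scores = torch.nn.functional.normalize(query_emb.float(), dim=-1) @ \
    torch.nn.functional.normalize(doc_emb.float(), dim=-1).T  # (num_queries, num_docs)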

ColGemma3 uses standard multi-vector late interaction:

import torch

from colpali_engine.models import ColGemma3, ColGemmaProcessor3

model = ColGemma3.from_pretrained(
    "Cognitive-Lab/ColNetraEmbed",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# `inputs` comes from the processor (batched images or queries)
embeddings = model(**inputs)  # (batch, num_patches, 128)
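
The MaxSim late-interaction score can also be written directly in torch. This sketch is independent of the ColGemmaProcessor3 scoring utilities mentioned above and ignores padding masks for brevity:

import torch

def maxsim_scores(q_emb: torch.Tensor, d_emb: torch.Tensor) -> torch.Tensor:
    """q_emb: (num_queries, q_len, 128); d_emb: (num_docs, d_len, 128)."""
    # Token-level dot products: (num_queries, num_docs, q_len, d_len)
    sim = torch.einsum("qnd,pmd->qpnm", q_emb.float(), d_emb.float())
    # Each query token keeps its best-matching document patch, then sum over query tokens.
    return sim.max(dim=-1).values.sum(dim=-1)  # (num_queries, num_docs)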

Models

Related Work

adithya-s-k and others added 8 commits December 3, 2025 23:58
- Introduced BiGemma3 and BiGemmaProcessor3 for image and text processing.
- Added ColGemma3 and ColGemmaProcessor3 for late interaction retrieval.
- Implemented model and processor classes with appropriate forward methods.
- Created unit tests for BiGemma3 and ColGemma3 models and their processors.
- Ensured compatibility with existing Gemma3 architecture and added necessary processing utilities.
…mension validation and improved processor loading
- Implemented offline and online testing for BiGemma3 using Matryoshka embeddings.
- Created synthetic images and queries for testing across multiple dimensions (768, 1536, 2560).
- Validated image and query encoding, similarity scoring, and retrieval performance.
- Configured Modal app with necessary dependencies and environment settings.
- Added comprehensive logging and validation checks for test results.
- Implemented `serve_hf_snapshot.py` for HuggingFace model serving with optimized cold start and warmup.
- Introduced `serve_vllm_snapshot.py` for vLLM model serving with sleep mode and GPU memory snapshots.
- Added comprehensive benchmark report for inference performance in `INFERNECE_PERFORMANCE.md`.
- Both scripts support FastAPI endpoints for embedding generation and health checks.
- Configured deployment settings including GPU type, memory, and scaledown behavior.
…arameter and adjust forward method for validation
@ManuelFay
Collaborator

Will look at this tomorrow! Thanks!

@athrael-soju
Contributor

@adithya-s-k could you add some interpretability maps from your tests? Just checking how they differ from colmodernvbert and colqwen3

@adithya-s-k
Contributor Author

@athrael-soju, I have pushed the code for the interpretability maps, do check it out.

@ManuelFay
Collaborator

LGTM, but can you please run ruff on the code and fix the tests so the CI passes?

@ManuelFay
Collaborator

For ruff it's probably not even on you. The CI was disabled last week due to the Shai-Hulud bug, so the ruff checks didn't pass on one of the merged PRs.
I'll merge right after!

@ManuelFay
Collaborator

The Gemma tests fail because of the model gating. Do you have a base model (like we do for all the supported architectures) that is initialized with the final projection? Otherwise the init will be random every time if we start from Gemma (+ the gating problem).

@adithya-s-k
Contributor Author

Hey, I have just pushed two models.
These are not gated and should be easy to test: they are just the base Gemma model and, for the col model, the base model + the projection layers.

The final checkpoints can also be used.

@ManuelFay
Collaborator

Yeah, looks good! Can you update the PR to include them so we run the CI again?
You can also add the results of the final model in the README (along with the link if you want).

@adithya-s-k
Contributor Author

Hi @ManuelFay, I have made all the requested changes.
Updated the PR to include the ungated base models for both BiGemma3 and ColGemma3, fixed the model references in tests, and added the interpretability tests.

I have locally verified that the tests now run correctly with these models and everything looks good on my side.

@ManuelFay
Collaborator

Awesome, I'll merge my linting PR and then merge yours! Thanks a ton!

@ManuelFay
Collaborator

Can you rebase on main (it will make the ruff CI happy)? Then we will merge!

@ManuelFay
Collaborator

Only 1 ruff error remaining. Then we can merge, the rest looks nice!
Thanks again!

@adithya-s-k
Contributor Author

@ManuelFay I have fixed the ruff issue; I think everything should be set to merge the PR.

@ManuelFay
Collaborator

We should document more clearly how to run ruff, but the CI is still failing.

You need to run `ruff format --check`.

I approved and will merge right after

@adithya-s-k
Contributor Author

@ManuelFay I have run it locally and tested; it should pass all the checks now.

@ManuelFay merged commit 8b6700f into illuin-tech:main on Dec 30, 2025
6 checks passed
@ManuelFay
Collaborator

Thank you for the contributions! Don't hesitate to submit your model results on the MTEB visual retrieval leaderboard!
